{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_3_keras_l1_l2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 5: Regularization and Dropout**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 5 Material\n",
    "\n",
    "* Part 5.1: Part 5.1: Introduction to Regularization: Ridge and Lasso [[Video]](https://www.youtube.com/watch?v=jfgRtCYjoBs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_1_reg_ridge_lasso.ipynb)\n",
    "* Part 5.2: Using K-Fold Cross Validation with Keras [[Video]](https://www.youtube.com/watch?v=maiQf8ray_s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_2_kfold.ipynb)\n",
    "* **Part 5.3: Using L1 and L2 Regularization with Keras to Decrease Overfitting** [[Video]](https://www.youtube.com/watch?v=JEWzWv1fBFQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_3_keras_l1_l2.ipynb)\n",
    "* Part 5.4: Drop Out for Keras to Decrease Overfitting [[Video]](https://www.youtube.com/watch?v=bRyOi0L6Rs8&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_4_dropout.ipynb)\n",
    "* Part 5.5: Benchmarking Keras Deep Learning Regularization Techniques [[Video]](https://www.youtube.com/watch?v=1NLBwPumUAs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_5_bootstrap.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 5.3: L1 and L2 Regularization to Decrease Overfitting\n",
    "\n",
    "L1 and L2 regularization are two common regularization techniques that can reduce the effects of overfitting [[Cite:ng2004feature]](http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf). These algorithms can either work with an objective function or as a part of the backpropagation algorithm. In both cases, the regularization algorithm is attached to the training algorithm by adding an objective.  \n",
    "\n",
    "These algorithms work by adding a weight penalty to the neural network training. This penalty encourages the neural network to keep the weights to small values. Both L1 and L2 calculate this penalty differently. You can add this penalty calculation to the calculated gradients for gradient-descent-based algorithms, such as backpropagation. The penalty is negatively combined with the objective score for objective-function-based training, such as simulated annealing.\n",
    "\n",
    "Both L1 and L2 work differently in that they penalize the size of the weight. L2 will force the weights into a pattern similar to a Gaussian distribution; the L1 will force the weights into a pattern similar to a Laplace distribution, as demonstrated in Figure 5.L1L2.\n",
    "\n",
    "**Figure 5.L1L2: L1 vs L2**\n",
    "![L1 vs L2](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_9_l1_l2.png \"L1 vs L2\")\n",
    "\n",
    "As you can see, L1 algorithm is more tolerant of weights further from 0, whereas the L2 algorithm is less tolerant. We will highlight other important differences between L1 and L2 in the following sections. You also need to note that both L1 and L2 count their penalties based only on weights; they do not count penalties on bias values. Keras allows [l1/l2 to be directly added to your network](http://tensorlayer.readthedocs.io/en/stable/modules/cost.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from scipy.stats import zscore\n",
    "\n",
    "# Read the data set\n",
    "df = pd.read_csv(\n",
    "    \"https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv\",\n",
    "    na_values=['NA','?'])\n",
    "\n",
    "# Generate dummies for job\n",
    "df = pd.concat([df,pd.get_dummies(df['job'],prefix=\"job\")],axis=1)\n",
    "df.drop('job', axis=1, inplace=True)\n",
    "\n",
    "# Generate dummies for area\n",
    "df = pd.concat([df,pd.get_dummies(df['area'],prefix=\"area\")],axis=1)\n",
    "df.drop('area', axis=1, inplace=True)\n",
    "\n",
    "# Missing values for income\n",
    "med = df['income'].median()\n",
    "df['income'] = df['income'].fillna(med)\n",
    "\n",
    "# Standardize ranges\n",
    "df['income'] = zscore(df['income'])\n",
    "df['aspect'] = zscore(df['aspect'])\n",
    "df['save_rate'] = zscore(df['save_rate'])\n",
    "df['age'] = zscore(df['age'])\n",
    "df['subscriptions'] = zscore(df['subscriptions'])\n",
    "\n",
    "# Convert to numpy - Classification\n",
    "x_columns = df.columns.drop('product').drop('id')\n",
    "x = df[x_columns].values\n",
    "dummies = pd.get_dummies(df['product']) # Classification\n",
    "products = dummies.columns\n",
    "y = dummies.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now create a Keras network with L1 regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fold #1\n",
      "Fold score (accuracy): 0.64\n",
      "Fold #2\n",
      "Fold score (accuracy): 0.6775\n",
      "Fold #3\n",
      "Fold score (accuracy): 0.6825\n",
      "Fold #4\n",
      "Fold score (accuracy): 0.6675\n",
      "Fold #5\n",
      "Fold score (accuracy): 0.645\n",
      "Final score (accuracy): 0.6625\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import os\n",
    "import numpy as np\n",
    "from sklearn import metrics\n",
    "from sklearn.model_selection import KFold\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation\n",
    "from tensorflow.keras import regularizers\n",
    "\n",
    "# Cross-validate\n",
    "kf = KFold(5, shuffle=True, random_state=42)\n",
    "    \n",
    "oos_y = []\n",
    "oos_pred = []\n",
    "fold = 0\n",
    "\n",
    "for train, test in kf.split(x):\n",
    "    fold+=1\n",
    "    print(f\"Fold #{fold}\")\n",
    "        \n",
    "    x_train = x[train]\n",
    "    y_train = y[train]\n",
    "    x_test = x[test]\n",
    "    y_test = y[test]\n",
    "    \n",
    "    #kernel_regularizer=regularizers.l2(0.01),\n",
    "    \n",
    "    model = Sequential()\n",
    "    # Hidden 1\n",
    "    model.add(Dense(50, input_dim=x.shape[1], \n",
    "            activation='relu',\n",
    "             activity_regularizer=regularizers.l1(1e-4))) \n",
    "    # Hidden 2\n",
    "    model.add(Dense(25, activation='relu', \n",
    "                    activity_regularizer=regularizers.l1(1e-4))) \n",
    "     # Output\n",
    "    model.add(Dense(y.shape[1],activation='softmax'))\n",
    "    model.compile(loss='categorical_crossentropy', optimizer='adam')\n",
    "\n",
    "    model.fit(x_train,y_train,validation_data=(x_test,y_test),\n",
    "              verbose=0,epochs=500)\n",
    "    \n",
    "    pred = model.predict(x_test)\n",
    "    \n",
    "    oos_y.append(y_test)\n",
    "    # raw probabilities to chosen class (highest probability)\n",
    "    pred = np.argmax(pred,axis=1) \n",
    "    oos_pred.append(pred)        \n",
    "\n",
    "    # Measure this fold's accuracy\n",
    "    y_compare = np.argmax(y_test,axis=1) # For accuracy calculation\n",
    "    score = metrics.accuracy_score(y_compare, pred)\n",
    "    print(f\"Fold score (accuracy): {score}\")\n",
    "\n",
    "\n",
    "# Build the oos prediction list and calculate the error.\n",
    "oos_y = np.concatenate(oos_y)\n",
    "oos_pred = np.concatenate(oos_pred)\n",
    "oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation\n",
    "\n",
    "score = metrics.accuracy_score(oos_y_compare, oos_pred)\n",
    "print(f\"Final score (accuracy): {score}\")    \n",
    "    \n",
    "# Write the cross-validated prediction\n",
    "oos_y = pd.DataFrame(oos_y)\n",
    "oos_pred = pd.DataFrame(oos_pred)\n",
    "oosDF = pd.concat( [df, oos_y, oos_pred],axis=1 )\n",
    "#oosDF.to_csv(filename_write,index=False)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
